Online Learning from Finite Training Sets: An Analytical Case Study

Neural Information Processing Systems

By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of the learning rate η (and, less importantly, weight decay) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
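
The setting described in the abstract can be illustrated with a small numerical sketch: a linear student trained online, one randomly drawn example at a time, from a fixed finite training set. This is not the paper's analytical calculation. The function names, the noise-free linear teacher, the Gaussian inputs, the 1/sqrt(N) output scaling, and the form of the weight decay term are all assumptions chosen for illustration; `eta` stands for the learning rate and `weight_decay` for the weight decay parameter.

```python
import random


def generalization_error(student, teacher):
    # For isotropic unit-variance inputs and outputs scaled by 1/sqrt(N)
    # (both assumptions of this sketch), the expected squared error on a
    # new input is |student - teacher|^2 / (2N).
    n = len(teacher)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / (2 * n)


def online_linear_learning(n_weights=20, n_examples=40, eta=0.5,
                           weight_decay=0.0, n_steps=2000, seed=0):
    """Online gradient descent on a finite training set (illustrative only).

    Returns the generalization error before training and after every step.
    """
    rng = random.Random(seed)
    scale = n_weights ** 0.5

    # Hypothetical noise-free linear teacher that generates the targets.
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n_weights)]
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)]
              for _ in range(n_examples)]
    targets = [sum(t * x for t, x in zip(teacher, xs)) / scale
               for xs in inputs]

    student = [0.0] * n_weights
    errors = [generalization_error(student, teacher)]

    for _ in range(n_steps):
        # Online learning: pick one example at random from the fixed
        # training set, so examples are revisited many times.
        mu = rng.randrange(n_examples)
        xs = inputs[mu]
        residual = sum(w * x for w, x in zip(student, xs)) / scale - targets[mu]

        # Gradient step on the squared error of this single example,
        # plus an assumed weight decay term scaled by 1/N.
        student = [w - eta * (residual * x / scale + weight_decay * w / n_weights)
                   for w, x in zip(student, xs)]
        errors.append(generalization_error(student, teacher))

    return errors


if __name__ == "__main__":
    for eta in (0.1, 0.5, 1.0):
        curve = online_linear_learning(eta=eta)
        print(f"eta={eta}: initial={curve[0]:.4f}, final={curve[-1]:.6f}")
```

Running the script prints the initial and final generalization error for a few learning rates, which gives a rough empirical feel for the dependence on η that the paper treats exactly in the large-N limit.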